Tibbles are similar to data frames with a few tricks up their sleave. One is that you can use so called list columns. So instead of having datasets (or LC-MS raw data) in a list or in different variables it can be in one table with associated meta data.
Lets take some raw data data and make it into a table. We will do this in the pipe/dplyr because that is much more readable. First we load some packages we will need in this tutorial.
library(MetabolomiQCsR)
library(purrr)
library(tidyr)
library(dplyr)
library(faahKO)
library(magrittr)
library(ggplot2)
library(plotly)
library(knitr)Now we first find the path to some test data:
files <- file.path(find.package("faahKO"), "cdf/KO") %>%
list.files(full.names=TRUE)
files## [1] "C:/Users/JPZS/Documents/R/win-library/3.3/faahKO/cdf/KO/ko15.CDF"
## [2] "C:/Users/JPZS/Documents/R/win-library/3.3/faahKO/cdf/KO/ko16.CDF"
## [3] "C:/Users/JPZS/Documents/R/win-library/3.3/faahKO/cdf/KO/ko18.CDF"
## [4] "C:/Users/JPZS/Documents/R/win-library/3.3/faahKO/cdf/KO/ko19.CDF"
## [5] "C:/Users/JPZS/Documents/R/win-library/3.3/faahKO/cdf/KO/ko21.CDF"
## [6] "C:/Users/JPZS/Documents/R/win-library/3.3/faahKO/cdf/KO/ko22.CDF"
Then we read the raw data with a wrapper that puts it into a table.
raw_tbl <- files %>% xcmsRaw_to_tbl
raw_tbl## # A tibble: 6 × 4
## file polarity raw
## <fctr> <chr> <list>
## 1 ko15.CDF unknown <S4: xcmsRaw>
## 2 ko16.CDF unknown <S4: xcmsRaw>
## 3 ko18.CDF unknown <S4: xcmsRaw>
## 4 ko19.CDF unknown <S4: xcmsRaw>
## 5 ko21.CDF unknown <S4: xcmsRaw>
## 6 ko22.CDF unknown <S4: xcmsRaw>
## # ... with 1 more variables: path <fctr>
A single raw object could be accessed like this:
raw_tbl$raw[[1]]## An "xcmsRaw" object with 1278 mass spectra
##
## Time range: 2501.4-4499.8 seconds (41.7-75 minutes)
## Mass range: 200-600 m/z
## Intensity range: 70-1373180
##
## MSn data on 0 mass(es)
## with 0 MSn spectra
## Profile method: bin
## Profile step: 1 m/z (401 grid points from 200 to 600 m/z)
##
## Memory usage: 10.7 MB
To get a TIC we can use get_EICs with m/z range from -Inf to Inf. Lets create a TIC for one of the raw files and plot it with plot_chrom:
TIC_tbl <- raw_tbl$raw[[1]] %>%
get_EICs(range_tbl = data.frame(mz_lower = -Inf,mz_upper = Inf)) %>%
extract2(1) # since the function can work on multiple ranges we get a list.
# So we take first element.
plot_chrom(TIC_tbl, RT_col = "scan_rt", Intensity_col = "intensity")We can also get the BPI instead.
BPI_tbl <- raw_tbl$raw[[1]] %>%
get_EICs(range_tbl = data.frame(mz_lower = -Inf,mz_upper = Inf), BPI = TRUE) %>%
extract2(1)
plot_chrom(BPI_tbl, RT_col = "scan_rt", Intensity_col = "intensity")If there is a contaminant (or a continuous calibrant) you can exclude it. In this case there is no such thing in the data so for illustrative purposes we can remove a peak from the TIC (see if you can find it!).
TIC_tbl <- raw_tbl$raw[[1]] %>%
get_EICs(range_tbl = data.frame(mz_lower = -Inf, mz_upper = Inf),
exclude_mz = 279.000,
exclude_ppm = 1000
) %>%
extract2(1)
plot_chrom(TIC_tbl, RT_col = "scan_rt", Intensity_col = "intensity")Now lets do an EIC instead of a TIC. We can look at the mass we excluded above. We again take out just a single raw file. First we need to get the m/z range of the EIC slice.
ppm <- 1000 # this is not accurate mass data
EIC_intervals <- data_frame(mz=279.000,
mz_lower = mz-(ppm/1E6)*mz,
mz_upper = mz+(ppm/1E6)*mz
)
EIC_intervals## # A tibble: 1 × 3
## mz mz_lower mz_upper
## <dbl> <dbl> <dbl>
## 1 279 278.721 279.279
Now we get the EIC data and plot it.
EIC_data <- get_EICs(raw_tbl$raw[[1]], EIC_intervals) %>% extract2(1)
EIC_data## # A tibble: 1,278 × 3
## scan intensity scan_rt
## <int> <dbl> <dbl>
## 1 1 2598 41.68963
## 2 2 2515 41.71572
## 3 3 2452 41.74180
## 4 4 2525 41.76788
## 5 5 2717 41.79397
## 6 6 2805 41.82005
## 7 7 2745 41.84613
## 8 8 2644 41.87222
## 9 9 2727 41.89830
## 10 10 2972 41.92438
## # ... with 1,268 more rows
p <- plot_chrom(EIC_data, RT_col = "scan_rt", Intensity_col = "intensity")
pAs a side note we can also get interactive plots with plotly:
p %>% ggplotly %>% layout(margin = list(l = 80, b = 60))Now lets get many EICs at the same time for all the files. This might be a bit hairy but I will try to explain each step.
First we make a table of ranges:
ppm <- 1000 # this is not accurate mass data. Used to create intervals below.
range_tbls <- data_frame(mz = c(508.1, 279.0, 577.3)) %>%
mutate(mz_lower = mz-((ppm)/1E6)*mz,
mz_upper = mz+((ppm)/1E6)*mz
)
range_tbls## # A tibble: 3 × 3
## mz mz_lower mz_upper
## <dbl> <dbl> <dbl>
## 1 508.1 507.5919 508.6081
## 2 279.0 278.7210 279.2790
## 3 577.3 576.7227 577.8773
Then we need to make a copy of the table for each of the files. This is so that they can go in the table together.
range_tbls %<>% list %>%
rep(nrow(raw_tbl)) %>%
data_frame(ranges = .)
range_tbls## # A tibble: 6 × 1
## ranges
## <list>
## 1 <tibble [3 × 3]>
## 2 <tibble [3 × 3]>
## 3 <tibble [3 × 3]>
## 4 <tibble [3 × 3]>
## 5 <tibble [3 × 3]>
## 6 <tibble [3 × 3]>
At this point we are ready to merge the two tables.
range_tbls_and_files <- bind_cols(raw_tbl, range_tbls)
range_tbls_and_files %>% select(-path) # In this display we remove the path column just to better show the relevant data.## # A tibble: 6 × 4
## file polarity raw ranges
## <fctr> <chr> <list> <list>
## 1 ko15.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]>
## 2 ko16.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]>
## 3 ko18.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]>
## 4 ko19.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]>
## 5 ko21.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]>
## 6 ko22.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]>
Now the FUN begins. We use map2 from the purrr to go through each row of the table and run get_EICs on each combination of the raw and ranges columns.
range_tbls_and_files %<>% mutate(EIC = map2(raw,ranges, get_EICs ))
range_tbls_and_files %>% select(-path)## # A tibble: 6 × 5
## file polarity raw ranges EIC
## <fctr> <chr> <list> <list> <list>
## 1 ko15.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]> <list [3]>
## 2 ko16.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]> <list [3]>
## 3 ko18.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]> <list [3]>
## 4 ko19.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]> <list [3]>
## 5 ko21.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]> <list [3]>
## 6 ko22.CDF unknown <S4: xcmsRaw> <tibble [3 × 3]> <list [3]>
# Inside each of the EIC lists we have a table for each EIC slice:
range_tbls_and_files$EIC[[1]]## [[1]]
## # A tibble: 1,278 × 3
## scan intensity scan_rt
## <int> <dbl> <dbl>
## 1 1 620 41.68963
## 2 2 439 41.71572
## 3 3 459 41.74180
## 4 4 446 41.76788
## 5 5 413 41.79397
## 6 6 428 41.82005
## 7 7 559 41.84613
## 8 8 693 41.87222
## 9 9 751 41.89830
## 10 10 684 41.92438
## # ... with 1,268 more rows
##
## [[2]]
## # A tibble: 1,278 × 3
## scan intensity scan_rt
## <int> <dbl> <dbl>
## 1 1 2598 41.68963
## 2 2 2515 41.71572
## 3 3 2452 41.74180
## 4 4 2525 41.76788
## 5 5 2717 41.79397
## 6 6 2805 41.82005
## 7 7 2745 41.84613
## 8 8 2644 41.87222
## 9 9 2727 41.89830
## 10 10 2972 41.92438
## # ... with 1,268 more rows
##
## [[3]]
## # A tibble: 1,278 × 3
## scan intensity scan_rt
## <int> <dbl> <dbl>
## 1 1 1356 41.68963
## 2 2 1473 41.71572
## 3 3 826 41.74180
## 4 4 751 41.76788
## 5 5 1131 41.79397
## 6 6 941 41.82005
## 7 7 918 41.84613
## 8 8 563 41.87222
## 9 9 592 41.89830
## 10 10 605 41.92438
## # ... with 1,268 more rows
So now we need to wrangle the data a bit to be able to unwrap those nested tables. We go into the EIC lists, merge the lists while we add a row to show which m/z slice each belongs to. While we are at it we change the new mz column to a factor.
range_tbls_and_files %<>% mutate(EIC = map2(EIC,ranges, ~ bind_rows(.x %>% setNames(.y$mz), .id = "mz") %>%
mutate(mz = mz %>% as.numeric %>% as.factor)
)
)
range_tbls_and_files$EIC[[1]]## # A tibble: 3,834 × 4
## mz scan intensity scan_rt
## <fctr> <int> <dbl> <dbl>
## 1 508.1 1 620 41.68963
## 2 508.1 2 439 41.71572
## 3 508.1 3 459 41.74180
## 4 508.1 4 446 41.76788
## 5 508.1 5 413 41.79397
## 6 508.1 6 428 41.82005
## 7 508.1 7 559 41.84613
## 8 508.1 8 693 41.87222
## 9 508.1 9 751 41.89830
## 10 508.1 10 684 41.92438
## # ... with 3,824 more rows
So now we can get rid of those nested tables. First we remove the ranges and the raw columns because we cannot unnest everything together.
range_tbls_and_files %<>% select(-ranges, -raw) %>% unnestNow we can even plot all the EICs for all the files at the same time of we want.
p <- range_tbls_and_files %>%
plot_chrom(RT_col = "scan_rt", Intensity_col = "intensity") +
facet_grid(file ~ mz) # since plot_chrom gives a ggplot2 object
# we can continue manipulating it.
pYou can also generate a separate plot for each file/mz combination very easily:
# Nest the table again by file/mz
range_tbls_and_files %<>% group_by(file, mz, path) %>% nest(.key = "EIC")
range_tbls_and_files## # A tibble: 18 × 4
## file mz
## <fctr> <fctr>
## 1 ko15.CDF 508.1
## 2 ko15.CDF 279
## 3 ko15.CDF 577.3
## 4 ko16.CDF 508.1
## 5 ko16.CDF 279
## 6 ko16.CDF 577.3
## 7 ko18.CDF 508.1
## 8 ko18.CDF 279
## 9 ko18.CDF 577.3
## 10 ko19.CDF 508.1
## 11 ko19.CDF 279
## 12 ko19.CDF 577.3
## 13 ko21.CDF 508.1
## 14 ko21.CDF 279
## 15 ko21.CDF 577.3
## 16 ko22.CDF 508.1
## 17 ko22.CDF 279
## 18 ko22.CDF 577.3
## # ... with 2 more variables: path <fctr>, EIC <list>
# now make the plots
range_tbls_and_files %<>% mutate(plot = map(EIC, ~ plot_chrom(.x, RT_col = "scan_rt", Intensity_col = "intensity")))
range_tbls_and_files## # A tibble: 18 × 5
## file mz
## <fctr> <fctr>
## 1 ko15.CDF 508.1
## 2 ko15.CDF 279
## 3 ko15.CDF 577.3
## 4 ko16.CDF 508.1
## 5 ko16.CDF 279
## 6 ko16.CDF 577.3
## 7 ko18.CDF 508.1
## 8 ko18.CDF 279
## 9 ko18.CDF 577.3
## 10 ko19.CDF 508.1
## 11 ko19.CDF 279
## 12 ko19.CDF 577.3
## 13 ko21.CDF 508.1
## 14 ko21.CDF 279
## 15 ko21.CDF 577.3
## 16 ko22.CDF 508.1
## 17 ko22.CDF 279
## 18 ko22.CDF 577.3
## # ... with 3 more variables: path <fctr>, EIC <list>, plot <list>
range_tbls_and_files$plot[[1]]This probably only makes sense if you are look for the max of a peak or if you have a contaminant and you want to know the median intensity etc.
EIC_summary <- range_tbls_and_files %>%
select(-plot) %>% # can't unnest the plot and EIC at the same time
unnest %>%
group_by(file, mz, path) %>%
summarise(EIC_median = median(intensity),
EIC_mean = mean(intensity),
EIC_sd = sd(intensity),
EIC_max = max(intensity)
)
EIC_summary %>% select(-path) %>% kable # kable is just to show a nice table below instead of the print display| file | mz | EIC_median | EIC_mean | EIC_sd | EIC_max |
|---|---|---|---|---|---|
| ko15.CDF | 279 | 2268.5 | 13566.8670 | 72950.8749 | 805248 |
| ko15.CDF | 508.1 | 1479.0 | 34870.9906 | 158436.2794 | 1298944 |
| ko15.CDF | 577.3 | 1103.5 | 8531.4108 | 38677.2575 | 457024 |
| ko16.CDF | 279 | 1994.0 | 11381.0313 | 52718.6966 | 547904 |
| ko16.CDF | 508.1 | 1731.5 | 34384.7128 | 143008.7030 | 1115648 |
| ko16.CDF | 577.3 | 1002.5 | 4645.3091 | 16952.8447 | 143744 |
| ko18.CDF | 279 | 2241.0 | 12305.5156 | 61206.9076 | 632896 |
| ko18.CDF | 508.1 | 2963.0 | 31214.2121 | 125154.1095 | 1039296 |
| ko18.CDF | 577.3 | 696.5 | 2240.9390 | 5490.7539 | 28512 |
| ko19.CDF | 279 | 1748.5 | 8695.0156 | 39067.6209 | 361536 |
| ko19.CDF | 508.1 | 3014.5 | 19624.6487 | 79410.1471 | 695552 |
| ko19.CDF | 577.3 | 755.0 | 761.6150 | 447.4914 | 3365 |
| ko21.CDF | 279 | 1677.5 | 10570.1776 | 49061.6622 | 483776 |
| ko21.CDF | 508.1 | 1756.0 | 20184.8232 | 81555.5366 | 687680 |
| ko21.CDF | 577.3 | 877.5 | 982.4225 | 678.2650 | 3994 |
| ko22.CDF | 279 | 1701.5 | 8992.5853 | 44103.5943 | 465600 |
| ko22.CDF | 508.1 | 1689.5 | 17618.2261 | 72745.8346 | 627712 |
| ko22.CDF | 577.3 | 893.5 | 953.6995 | 603.1006 | 2528 |